Broom: Sweeping Out Garbage Collection from Big Data Systems

نویسندگان

  • Ionel Gog
  • Jana Giceva
  • Malte Schwarzkopf
  • Kapil Vaswani
  • Dimitrios Vytiniotis
  • G. Ramalingam
  • Manuel Costa
  • Derek Gordon Murray
  • Steven Hand
  • Michael Isard
چکیده

Many popular systems for processing “big data” are implemented in high-level programming languages with automatic memory management via garbage collection (GC). However, high object churn and large heap sizes put severe strain on the garbage collector. As a result, applications underperform significantly: GC increases the runtime of typical data processing tasks by up to 40%. We propose to use region-based memory management instead of GC in distributed data processing systems. In these systems, many objects have clearly defined lifetimes. Hence, it is natural to allocate these objects in fate-sharing regions, obviating the need to scan a large heap. Regions can be memory-safe and could be inferred automatically. Our initial results show that regionbased memory management reduces emulated Naiad vertex runtime by 34% for typical data analytics jobs.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Reducing Sweep Time for a Nearly Empty Heap

Mark and sweep garbage collectors are known for using time proportional to the heap size when sweeping memory, since all objects in the heap, regardless of whether they are live or not, must be visited in order to reclaim the memory occupied by dead objects. This paper introduces a sweeping method which traverses only the live objects, so that sweeping can be done in time dependent only on the ...

متن کامل

The Active Memory Module: Bounded Garbage Collection Latency for Real-time Embedded Systems

In this paper, the performance analysis of the proposed Active Memory Module (AMM) for embedded systems is presented. Unlike the software counterparts, the AMM can perform a memory allocation in a predictable and bounded fashion (5 cycles). Moreover, it can also yield a bounded sweeping time regardless of the number of live objects or heap size. By utilizing the proposed system, the overall spe...

متن کامل

Yak: A High-Performance Big-Data-Friendly Garbage Collector

Most “Big Data” systems are written in managed languages, such as Java, C#, or Scala. These systems suffer from severe memory problems due to the massive volume of objects created to process input data. Allocating and deallocating a sea of data objects puts a severe strain on existing garbage collectors (GC), leading to high memory management overheads and reduced performance. This paper descri...

متن کامل

Active Memory Processor: A Hardware Garbage Collector for Real-Time Java Embedded Devices

Java possesses many advantages for embedded system development, including fast product deployment, portability, security, and a small memory footprint. As Java makes inroads into the market for embedded systems, much effort is being invested in designing real-time garbage collectors. The proposed garbage-collected memory module, a bitmap-based processor with standard DRAM cells is introduced to...

متن کامل

How Data Volume Affects Spark Based Data Analytics on a Scale-up Server

Sheer increase in volume of data over the last decade has triggered research in cluster computing frameworks that enable web enterprises to extract big insights from big data. While Apache Spark is gaining popularity for exhibiting superior scale-out performance on the commodity machines, the impact of data volume on the performance of Spark based data analytics in scale-up configuration is not...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015